[0fc53f]: / Notebook / Week 1 / load_data.ipynb

Download this file

2050 lines (2049 with data), 81.4 kB

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Introduction about dataset"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p><b>Intracranial hemorrhage🧠 (ICH)</b> is caused by bleeding within the brain tissue itself — a life-threatening type of stroke. A stroke occurs when the brain is deprived of oxygen and blood supply. ICH is most commonly caused by hypertension, arteriovenous malformations, or head trauma. Treatment focuses on stopping the bleeding, removing the blood clot (hematoma), and relieving the pressure on the brain.</p>\n",
    "<br/><br/>\n",
    "<p><b>Diagnosis</b> requires an urgent procedure. When a patient shows acute neurological symptoms such as severe headache or loss of consciousness, highly trained specialists review medical images of the patient’s cranium to look for the presence, location and type of hemorrhage. The process is complicated and often time consuming.</p>\n",
    "<br/><br/>\n",
    "<p>The current clinical protocol to diagnose Intracranial hemorrhage🧠 ICH is examining Computerized Tomography (CT) scans by radiologists to detect ICH and localize its regions. However, this process relies heavily on the availability of an experienced radiologist.CT images are examined by an expert radiologist to determine whether ICH has occurred and if so, detect its type and region. However, this diagnosis process relies on the availability of a subspecialty-trained neuroradiologist, and as a result, could be time inefficient and even inaccurate, especially in remote areas where specialized care is scarce.</p>\n",
    "<br/><br/>\n",
    "<p>In Recent years ,the Advancement in <b>Deep learning</b> has enable us to solve various problem, even in some cases it shows us better results than humans.we will try to solve Intracranical hemorrhage detection and segmentation using CT scan dataset of brain which is annoted by expert radiologists. </p>\n",
    "<p>The challenge is to build an algorithm to detect acute intracranial hemorrhage and its subtypes.</p>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    ">Intraparenchymal hemorrhage is blood that is located completely within the brain itself; intraventricular or subarachnoid hemorrhage is blood that has leaked into the spaces of the brain that normally contain cerebrospinal fluid (the ventricles or subarachnoid cisterns). Extra-axial hemorrhages are blood that collects in the tissue coverings that surround the brain (e.g. subdural or epidural subtypes). ee figure.) Patients may exhibit more than one type of cerebral hemorrhage, which c may appear on the same image. While small hemorrhages are less morbid than large hemorrhages typically, even a small hemorrhage can lead to death because it is an indicator of another type of serious abnormality (e.g. cerebral aneurysm).\n",
    ">\n",
    "> #### There are four types of ICH:\n",
    ">    * **Intraparenchymal hemorrhage**\n",
    ">    * **Epidural hemorrhage**\n",
    ">    * **Subdural hemorrhage**\n",
    ">    * **Subarachnoid hemorrhage**\n",
    ">    * **intraventricular hemorrhage**\n",
    ">\n",
    "> one patient can exibits more than one type of hemorrhage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "![file](https://user-images.githubusercontent.com/58046531/89164136-4eac1d00-d594-11ea-9408-6d271518b3a7.png)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This datset have six classes \n",
    "1. any - any of five class of hemorrhage is present or not in patient\n",
    "2. epidural\n",
    "3. intraparenchymal\n",
    "4. intraventricular \n",
    "5. subarachnoid\n",
    "6. subdural\n",
    "\n",
    "It is possible that one patient have more than type of hemorrhage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "base_url = '~/kaggle/rsna-intracranial-hemorrhage-detection/'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "TRAIN_DIR = '/home/ubuntu/kaggle/rsna-intracranial-hemorrhage-detection/stage_2_train'\n",
    "TEST_DIR = '/home/ubuntu/kaggle/rsna-intracranial-hemorrhage-detection/stage_2_test'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 58,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os\n",
    "import pandas as pd\n",
    "import swifter\n",
    "import numpy as np\n",
    "from tqdm import *\n",
    "import re\n",
    "import seaborn as sns\n",
    "import pydicom\n",
    "import joblib"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Requirement already satisfied: pydicom in /home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages (2.0.0)\n",
      "Note: you may need to restart the kernel to use updated packages.\n"
     ]
    }
   ],
   "source": [
    "pip install pydicom"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "752803\r\n"
     ]
    }
   ],
   "source": [
    "! ls {TRAIN_DIR} | wc -l\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "121232\r\n"
     ]
    }
   ],
   "source": [
    "! ls {TEST_DIR} | wc -l"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "ID_000012eaf.dcm\n",
      "ID_000039fa0.dcm\n",
      "ID_00005679d.dcm\n",
      "ID_00008ce3c.dcm\n",
      "ID_0000950d7.dcm\n",
      "ls: write error: Broken pipe\n"
     ]
    }
   ],
   "source": [
    "! ls {TRAIN_DIR} | head -n 5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(4516842, 2)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th>Label</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>ID_12cadc6af_epidural</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>ID_12cadc6af_intraparenchymal</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>ID_12cadc6af_intraventricular</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>ID_12cadc6af_subarachnoid</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ID_12cadc6af_subdural</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>ID_12cadc6af_any</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>ID_38fd7baa0_epidural</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>ID_38fd7baa0_intraparenchymal</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>ID_38fd7baa0_intraventricular</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>ID_38fd7baa0_subarachnoid</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                              ID  Label\n",
       "0          ID_12cadc6af_epidural      0\n",
       "1  ID_12cadc6af_intraparenchymal      0\n",
       "2  ID_12cadc6af_intraventricular      0\n",
       "3      ID_12cadc6af_subarachnoid      0\n",
       "4          ID_12cadc6af_subdural      0\n",
       "5               ID_12cadc6af_any      0\n",
       "6          ID_38fd7baa0_epidural      0\n",
       "7  ID_38fd7baa0_intraparenchymal      0\n",
       "8  ID_38fd7baa0_intraventricular      0\n",
       "9      ID_38fd7baa0_subarachnoid      0"
      ]
     },
     "execution_count": 24,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_df = pd.read_csv(base_url+'stage_2_train.csv')\n",
    "print(train_df.shape)\n",
    "train_df.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(4516842, 3)\n"
     ]
    }
   ],
   "source": [
    "train_df[['ID', 'Subtype']] = train_df['ID'].str.rsplit(pat='_', n=1, expand=True)\n",
    "print(train_df.shape)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Here if we look then we find that each image have output for each class as True(1) or False(0) mean single image have six duplicate image.so we will convert them into one_hot_encoder and then single image will have single row."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th>Label</th>\n",
       "      <th>Subtype</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4516836</th>\n",
       "      <td>ID_4a85a3a3f</td>\n",
       "      <td>0</td>\n",
       "      <td>epidural</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4516837</th>\n",
       "      <td>ID_4a85a3a3f</td>\n",
       "      <td>0</td>\n",
       "      <td>intraparenchymal</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4516838</th>\n",
       "      <td>ID_4a85a3a3f</td>\n",
       "      <td>0</td>\n",
       "      <td>intraventricular</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4516839</th>\n",
       "      <td>ID_4a85a3a3f</td>\n",
       "      <td>0</td>\n",
       "      <td>subarachnoid</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4516840</th>\n",
       "      <td>ID_4a85a3a3f</td>\n",
       "      <td>0</td>\n",
       "      <td>subdural</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4516841</th>\n",
       "      <td>ID_4a85a3a3f</td>\n",
       "      <td>0</td>\n",
       "      <td>any</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   ID  Label           Subtype\n",
       "4516836  ID_4a85a3a3f      0          epidural\n",
       "4516837  ID_4a85a3a3f      0  intraparenchymal\n",
       "4516838  ID_4a85a3a3f      0  intraventricular\n",
       "4516839  ID_4a85a3a3f      0      subarachnoid\n",
       "4516840  ID_4a85a3a3f      0          subdural\n",
       "4516841  ID_4a85a3a3f      0               any"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_df.tail(6)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [],
   "source": [
    "def fix_id(img_id, img_dir=TRAIN_DIR):\n",
    "    if not re.match(r'ID_[a-z0-9]+', img_id):\n",
    "        sop = re.search(r'[a-z0-9]+', img_id)\n",
    "        if sop:\n",
    "            img_id_new = f'ID_{sop[0]}'\n",
    "            return img_id_new\n",
    "        else:\n",
    "            print(img_id)\n",
    "    return img_id"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "0          ID_12cadc6af\n",
       "1          ID_12cadc6af\n",
       "2          ID_12cadc6af\n",
       "3          ID_12cadc6af\n",
       "4          ID_12cadc6af\n",
       "               ...     \n",
       "4516837    ID_4a85a3a3f\n",
       "4516838    ID_4a85a3a3f\n",
       "4516839    ID_4a85a3a3f\n",
       "4516840    ID_4a85a3a3f\n",
       "4516841    ID_4a85a3a3f\n",
       "Name: ID, Length: 4516842, dtype: object"
      ]
     },
     "execution_count": 28,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_df['ID'].apply(fix_id)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "(752803, 7)\n"
     ]
    },
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th colspan=\"6\" halign=\"left\">Label</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subtype</th>\n",
       "      <th></th>\n",
       "      <th>any</th>\n",
       "      <th>epidural</th>\n",
       "      <th>intraparenchymal</th>\n",
       "      <th>intraventricular</th>\n",
       "      <th>subarachnoid</th>\n",
       "      <th>subdural</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>ID_000012eaf</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>ID_000039fa0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>ID_00005679d</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>ID_00008ce3c</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ID_0000950d7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   ID Label                                             \\\n",
       "Subtype                 any epidural intraparenchymal intraventricular   \n",
       "0        ID_000012eaf     0        0                0                0   \n",
       "1        ID_000039fa0     0        0                0                0   \n",
       "2        ID_00005679d     0        0                0                0   \n",
       "3        ID_00008ce3c     0        0                0                0   \n",
       "4        ID_0000950d7     0        0                0                0   \n",
       "\n",
       "                               \n",
       "Subtype subarachnoid subdural  \n",
       "0                  0        0  \n",
       "1                  0        0  \n",
       "2                  0        0  \n",
       "3                  0        0  \n",
       "4                  0        0  "
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_new = train_df.pivot_table(index='ID', columns='Subtype').reset_index()\n",
    "print(train_new.shape)\n",
    "train_new.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Subtype\n",
      "any                 107933\n",
      "epidural              3145\n",
      "intraparenchymal     36118\n",
      "intraventricular     26205\n",
      "subarachnoid         35675\n",
      "subdural             47166\n",
      "dtype: int64\n"
     ]
    }
   ],
   "source": [
    "subtype_ct = train_new['Label'].sum(axis=0)\n",
    "print(subtype_ct)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Distribution of each type of Hemorrhage"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAcoAAAD4CAYAAABsWabOAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjMsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+AADFEAAAa6klEQVR4nO3deZhlVX3u8e8rgw3diEwqitigBEVl6oKAIoOiURNjjNygwaBo5N4YJUbRq1FRNMaHGI0XcaCNggPhooATPipGAQUCTTdTN7MDKIErwaEFBGngd/84q+R0WbXrNFbVqSq+n+c5T+299tp7rXV2db29h7NPqgpJkjS+hwy7A5IkzWYGpSRJHQxKSZI6GJSSJHUwKCVJ6rD+sDugqbflllvW4sWLh90NSZpTVqxYcWtVbTW23KCchxYvXszy5cuH3Q1JmlOS3DBeuadeJUnqYFBKktTBU6/z0FU3/owlb/rMsLshSTNqxfsPnZbtekQpSVIHg1KSpA4GpSRJHQxKSZI6GJSSJHUwKCVJ6mBQSpLUwaCUJKmDQSlJUgeDUpKkDgalJEkdDEpJkjoYlEOQ5EtJViS5Isnhrez2JO9NclmSC5I8MskmSX6UZINW52FJrh+dlyRNP4NyOF5ZVUuAEeCIJFsAC4ELqmoX4LvAq6vqNuBs4I/bei8BTquqNWM3mOTwJMuTLL/n17fNyCAk6cHAoByOI5JcBlwAPBbYAbgbOKMtXwEsbtP/BhzWpg8DThhvg1W1tKpGqmpk/Y03ma5+S9KDjt9HOcOS7A8cCOxdVb9OcjawAFhTVdWq3UvbN1V1XpLFSfYD1quqVUPotiQ9aHlEOfM2BX7RQvKJwF4DrPMZ4GQmOJqUJE0fg3LmfQNYP8nlwHvonX6dzEnAZvTCUpI0gzz1OsOq6jfA88ZZtKivzqnAqX3L9gFOrapfTnP3JEljGJSzXJIP0wvW5w+7L5L0YGRQznJV9bph90GSHsy8RilJUgeDUpKkDgalJEkdDEpJkjoYlJIkdTAoJUnqYFBKktTBz1HOQ0/aZguWv//QYXdDkuYFjyglSepgUEqS1MGglCSpg0EpSVIHg1KSpA4GpSRJHfx4yDx0981X8ON3P7WzzrZHrZyh3kjS3OYRpSRJHQxKSZI6GJSSJHUwKCVJ6mBQSpLUwaCUJKmDQSlJUgeDUpKkDgalJEkdDEpJkjoYlJIkdTAop0CSdyc5cJzy/ZOcMYXtnJ1kZKq2J0manA9FnwJVddRUbCdJgFTVfVOxPUnS788jygkkeVmSZUkuTXJ8kvWS3J7kA0kuTvLtJFu1uicmOahNPzfJ1UnOBf68b3vvSnJk3/yqJIvb66okHwUuBh6b5GNJlie5IsnRMzx0SVIfg3IcSZ4EHAw8vap2Be4FDgEWAhdX1e7AOcA7x6y3APgE8ALgGcCjBmxyR+AzVbVbVd0AvK2qRoCdgf2S7DxAnw9v4br853fcO2CzkqTJGJTjexawBLgoyaVtfnvgPuCUVudzwD5j1nsi8KOquq6qqtUZxA1VdUHf/F8kuRi4BHgysNNkG6iqpVU1UlUjmy9cb8BmJUmT8Rrl+AJ8uqreulZh8o4x9WqcdccrA7iHtf9jsqBv+o6+NrYDjgT2qKpfJDlxTF1J0gzyiHJ83wYOSvIIgCSbJ3kcvffroFbnL4Fzx6x3NbBdkse3+Zf2Lbse2L1tb3dguwnafhi94Fyd5JHA836/oUiSfh8eUY6jqq5M8nbgzCQPAdYAf0svwJ6cZAWwmt51zP717kpyOPC1JLfSC9KntMWnAYe2U7kXAddO0PZlSS4BrgB+CJw35QOUJA0svUtpGkSS26tq0bD7MZmdH7NRnfE/n9BZZ9ujVs5QbyRpbkiyot1IuRZPvUqS1MGgXAdz4WhSkjS1DEpJkjoYlJIkdTAoJUnqYFBKktTBoJQkqYNBKUlSB5/MMw9tuPWT2fao5cPuhiTNCx5RSpLUwaCUJKmDQSlJUgeDUpKkDgalJEkdDEpJkjr48ZB56OpbrubpH376sLuhAZz3Or+XW5rtPKKUJKmDQSlJUgeDUpKkDgalJEkdDEpJkjoYlJIkdTAoJUnqYFBKktTBoJQkqYNBKUlSB4NSkqQO0xaUSc4foM7rk2w8XX2Ybkn2T3LGDLa3OMmqmWpPkjSNQVlVTxug2uuBcYMyyXpT26PfbtcHwUuSBjadR5S3t5/7Jzk7yalJrk5yUnqOAB4NnJXkrNF1krw7yYXA3kmOSnJRklVJliZJq3d2kg8lOb8t27OV79nKLmk/d2zlr0jyhSRfBc5sZW9q2748ydGtbHGSq5J8IskVSc5MslFb9oQk/5HksiQXJ3l8G+qiccb2rCRf7Hsvnp3k9L4xHpNkRdvenm08P0zyp339+F5r5+Ikg/ynQ5I0DQYOyiT7JDmsTW+VZLt1aGc3ekePOwHbA0+vqmOBm4ADquqAVm8hsKqq/rCqzgWOq6o9quopwEbAn/Rtc2E7an0N8KlWdjWwb1XtBhwF/FNf/b2Bl1fVM5M8B9gB2BPYFViSZN9WbwfgI1X1ZOCXwItb+UmtfBfgacDNE40N+A7wpCRbtTqHASf0jfHsqloC3Ab8I/Bs4EXAu1udW4BnV9XuwMHAsd1vLyQ5PMnyJMvX3L5msuqSpAENdBoyyTuBEWBHen/wNwA+Ry8UBrGsqm5s27oUWAycO069e4HT+uYPSPJmeqdnNweuAL7alp0MUFXfTfKwJA8HNgE+nWQHoFo/R32rqn7epp/TXpe0+UX0AvLHwI+q6tJWvgJYnGQT4DFV9cXW5l1tLOOOrarOTfJZ4GVJTqAX0oe2bd4NfKNNrwR+U1Vrkqxs7wut38cl2bW9J38wznu1lqpaCiwFWLTtopqsviRpMINer3sRvSOniwGq6qYWHoP6Td/0vR3t3lVV9wIkWQB8FBipqp8keRewoK/u2DAo4D3AWVX1oiSLgbP7lt/RNx3gfVV1fP8G2jpj+7pRqz+RicZ2Ar1Qvwv4QlXd08rXVNVo3+8bXb+q7uu7fvr3wE+BXegd9d/V0b4kaRoNeur17vbHvQCSLJyi9m+jdxQ4ntFQvDXJIuCgMcsPbn3ZB1hdVauBTYH/astf0dHuN4FXtu2S5DFJHjFR5ar6FXBjkj9r9R862d26VXUTvVPLbwdO7Ko7jk2Bm6vqPuCvgGm5sUmSNLlBg/LzSY4HHp7k1cB/AJ+YgvaXAl8fvZmnX1X9srWxEvgScNGYKr9oH0H5OPCqVvbPwPuSnEdHuFTVmcC/A//ZTnmeysSBPeqvgCOSXA6cDzxqkvrQu675k6q6coC6/T4KvDzJBfROu94xSX1J0jTJ/WcBJ6mYPJvedT2AM6vqW9PWq8n7cjZwZFUtH1YfBpHkOOCSqvrkTLa7aNtFtcubdpnJJvUAnfe684bdBUlNkhVVNTK2fF0+U7iS3vW6atPqkGQFvSPBNw67L5KkB26gU69J/hpYBvw5vWuFFyR55XR2rEtV7T/bjyaraklV7VtVv5m8tiRpthr0iPJNwG5V9TOAJFvQu073qc61JEma4wa9medGeneojroN+MnUd0eSpNll0CPK/wIuTPJletcoXwgsS/IGgKr64DT1T5KkoRo0KH/QXqO+3H6uy0MHJEmacwYNytOqyq93kiQ96Ax6jfLjSZYleU17pqokSQ8KAx1RVtU+Sf6A3rdgLE+yDDixPeFGs8wTH/FEP8guSVNk4K/Zqqpr6T239H8D+wH/p30H459PV+ckSRq2QR84sHOSfwWuAp4JvKCqntSm/3Ua+ydJ0lANejPPcfQeUP4PVXXnaGH7uq23T0vPJEmaBQY99Xp6VX22PyST/B1AVX12WnomSdIsMGhQHjpO2SumsB+SJM1Knadek7wU+EtguyRf6Vu0CfCz6eyYJEmzwWTXKM8Hbga2BD7QV34bcPl0dUqSpNmiMyir6gbgBmDvJI8C9qT3rNdrquqeGeifHoDbrrmGc/bdb9jdmFP2++45w+6CpFlq0I+HvIpZ9H2UkiTNlEE/HvJm/D5KSdKDkN9HKUlSh8nuen1Dmxz3+yinuW+SJA3dZKdeR79vcqLvo5QkaV6b7K7Xo2eqI5IkzUYD3cyT5Cx6p1zXUlXPnPIeSZI0iwx61+uRfdMLgBcDfo5SkjTvDfrFzSvGFJ2XxE9oS5LmvUFPvW7eN/sQYAR41LT0SJKkWWTQz1GuAJa31/nAG4BXTVUnkpw/QJ3XJ9l4qtpcV0l2TfL8juUjSY59gNt+V5IjJ68pSZppnUGZZI8kj6qq7apqe+Bo4Or2unKqOlFVTxug2uuBcYMyyXpT1ZcOuwLjBmWS9atqeVUdMQP9IMmg15YlSb+nyY4ojwfuBkiyL/A+4NPAamDpVHUiye3t5/5Jzk5yapKrk5yUniOARwNntTtwSXJ7kncnuZDeQ9uPSnJRklVJlrb1npRkWV87i5Nc3qaXJDknyYok30yydSs/O8kxSZYluTbJM5JsCLwbODjJpUkObkeBS5OcCXym9f2Mto1FSU5IsjLJ5Ule3D/ONn1QkhPHeS9e3cZxWZLTRo+ik5yY5INt/MdM1XsvSeo2WVCuV1U/b9MHA0ur6rSqegfwhGnq0270jh53ArYHnl5VxwI3AQdU1QGt3kJgVVX9YVWdCxxXVXtU1VOAjYA/qaqrgA2TbN83hs8n2QD4MHBQVS2h98za9/b1Yf2q2rP1451VdTdwFHBKVe1aVae0ekuAF1bVX44ZwzuA1VX11KraGfjOOoz/9DaOXYCrWPsU9x8AB1bVG8eulOTwJMuTLF+9Zs06NCdJ6jJpUPad5nsWa//Bn67Tf8uq6saqug+4FFg8Qb17gdP65g9IcmGSlcAzgSe38s8Df9GmDwZOAXYEngJ8K8mlwNuBbfq2dXr7uaKjfYCvVNWd45QfCHxkdKaqftGxjbGekuR7bRyH9I0D4AtVde94K1XV0qoaqaqRTTfYYB2akyR1mSzsTgbOSXIrcCfwPYAkT6B3+nU6/KZv+l4m7uNdo6GRZAHwUWCkqn6S5F30Pu8JvWD8QpLTgaqq65I8FbiiqvaepA9d7QPcMUF5GOcBDWPKFoyzHOBE4M+q6rIkrwD2H6A9SdI06TyirKr3Am+k98d7n6oa/UP/EOB109u133Eb9z97dqzR0Lk1ySJ635kJQFX9gF7gvYNeaAJcA2yVZG+AJBsk6T9yW9f2xzoTeO3oTJLN2uRP23XThwAvmmDdTYCb2+nhQwZsT5I0TSb9eEhVXVBVX6yqO/rKrq2qi6e3a79jKfD10Zt5+lXVL4FPACuBLwEXjalyCvAyeqdhadccDwKOSXIZvVO8k915exaw0+jNPJPU/Udgs3Zj0WXA6HXVtwBn0DuFffME674DuBD4Fr27iyVJQ5T7DxI1X+y4ySa1dLfdh92NOWW/7/qgKenBLsmKqhoZWz7oAwckSXpQMiglSepgUEqS1MGglCSpg0EpSVIHg1KSpA4GpSRJHQxKSZI6GJSSJHXwC4DnoU123NEnzUjSFPGIUpKkDgalJEkdDEpJkjoYlJIkdTAoJUnqYFBKktTBj4fMQ7fcuJrj3vjVYXdDHV77gRcMuwuSBuQRpSRJHQxKSZI6GJSSJHUwKCVJ6mBQSpLUwaCUJKmDQSlJUgeDUpKkDgalJEkdDEpJkjrM26BM8q4kR85ge69IctwUbet/JTl0nPLFSVZNRRuSpMH4rNdxJAmQqrpvGO1X1ceH0a4k6XfNqSPKJAuTfC3JZUlWJTk4yfVJtmzLR5Kc3bfKLkm+k+S6JK9udRYl+XaSi5OsTPLCVr44yVVJPgpcDDw2yceSLE9yRZKj+/qxR5LzWz+WJdmkLXp0km+09v65r/7tSd7b6l+Q5JGt/HGtL5e3n9u28t8eDSdZ0tb7T+Bvp+u9lSSNb04FJfBc4Kaq2qWqngJ8Y5L6OwN/DOwNHJXk0cBdwIuqanfgAOAD7QgSYEfgM1W1W1XdALytqkbadvZLsnOSDYFTgL+rql2AA4E72/q7AgcDTwUOTvLYVr4QuKDV/y7w6lZ+XGtvZ+Ak4NhxxnACcERV7d010CSHt1BffvuvV0/ytkiSBjXXgnIlcGCSY5I8o6omS4QvV9WdVXUrcBawJxDgn5JcDvwH8Bjgka3+DVV1Qd/6f5HkYuAS4MnATvTC9Oaqugigqn5VVfe0+t+uqtVVdRdwJfC4Vn43cEabXgEsbtN7A//epj8L7NPf+SSbAg+vqnP66oyrqpZW1UhVjSzaeNNJ3hZJ0qDm1DXKqro2yRLg+cD7kpwJ3MP9gb9g7CrjzB8CbAUsqao1Sa7vW++O0YpJtgOOBPaoql8kObHVyzjbHfWbvul7uf/9XVNVNU757wxxzHxXW5KkGTCnjijbqdNfV9XngH8BdgeuB5a0Ki8es8oLkyxIsgWwP3ARsClwSwvJA7j/qG+sh9ELztXtmuLzWvnV9K5F7tH6tEmSB/ofjvOBl7TpQ4Bz+xdW1S9b+/v01ZEkzaA5dURJ79rf+5PcB6wB/gbYCPhkkn8ALhxTfxnwNWBb4D1VdVOSk4CvJlkOXEov+H5HVV2W5BLgCuCHwHmt/O4kBwMfTrIRveuTBz7A8RwBfCrJm4D/Bg4bp85hrc6vgW8+wHYkSQ9Q7j8jqPli20ftUG8+5IPD7oY6vPYDLxh2FySNkWRFu4FzLXPq1KskSTPNoJQkqYNBKUlSB4NSkqQOBqUkSR0MSkmSOhiUkiR1MCglSepgUEqS1GGuPcJOA3jENpv65BdJmiIeUUqS1MGglCSpg0EpSVIHg1KSpA4GpSRJHQxKSZI6+PGQeejmH/2A977soGF3Q3PA2z536rC7IM16HlFKktTBoJQkqYNBKUlSB4NSkqQOBqUkSR0MSkmSOhiUkiR1MCglSepgUEqS1MGglCSpg0EpSVIHg/IBSvKuJEcOUO/EJFPy4NUki5OsmoptSZIGY1DOMkl8UL0kzSIGZZ8kC5N8LcllSVYlOTjJ9Um2bMtHkpzdt8ouSb6T5Lokr251kuS4JFcm+RrwiL7tj7utdnS6NMmZwGfakeP3klzcXk+bobdAkjSGRy9rey5wU1X9MUCSTYFjOurvDOwFLAQuacG4F7Aj8FTgkcCVwKcGaHsJsE9V3ZlkY+DZVXVXkh2Ak4GRrpWTHA4cDrDpxhsN0JwkaRAeUa5tJXBgkmOSPKOqVk9S/8tVdWdV3QqcBewJ7AucXFX3VtVNwHcGbPsrVXVnm94A+ESSlcAXgJ0mW7mqllbVSFWNLFzw0AGblCRNxiPKPlV1bZIlwPOB97VTofdw/38oFoxdZYL5seWjurZ1R9/03wM/BXZp9e8aaACSpCnnEWWfJI8Gfl1VnwP+BdgduJ7eaVGAF49Z5YVJFiTZAtgfuAj4LvCSJOsl2Ro4oK9+17b6bQrcXFX3AX8FrPdAxyRJ+v14RLm2pwLvT3IfsAb4G2Aj4JNJ/gG4cEz9ZcDXgG2B91TVTUm+CDyT3mnca4Fz+uof3bGtfh8FTkvyP+id0r2jo64kaRqlaqKzhJqrHrPFZvWa5z1r2N3QHPC2z5067C5Is0aSFVX1OzdOeupVkqQOBqUkSR0MSkmSOhiUkiR1MCglSepgUEqS1MGglCSpg0EpSVIHn8wzD2293eP9ILkkTRGPKCVJ6mBQSpLUwaCUJKmDD0Wfh5LcBlwz7H7MkC2BW4fdiRniWOcnxzp7PK6qthpb6M0889M14z0Bfz5Kstyxzj+OdX6aq2P11KskSR0MSkmSOhiU89PSYXdgBjnW+cmxzk9zcqzezCNJUgePKCVJ6mBQSpLUwaCcR5I8N8k1Sb6f5C3D7s+gkjw2yVlJrkpyRZK/a+WbJ/lWkuvaz81aeZIc28Z5eZLd+7b18lb/uiQv7ytfkmRlW+fYJJn5kd4vyXpJLklyRpvfLsmFrd+nJNmwlT+0zX+/LV/ct423tvJrkvxRX/ms+T1I8vAkpya5uu3fvefrfk3y9+33d1WSk5MsmE/7NcmnktySZFVf2bTvy4namFFV5WsevID1gB8A2wMbApcBOw27XwP2fWtg9za9CXAtsBPwz8BbWvlbgGPa9POBrwMB9gIubOWbAz9sPzdr05u1ZcuAvds6XweeN+QxvwH4d+CMNv954CVt+uPA37Tp1wAfb9MvAU5p0zu1ffxQYLu279ebbb8HwKeBv27TGwIPn4/7FXgM8CNgo779+Yr5tF+BfYHdgVV9ZdO+LydqY0bHPoxfKl/TsCN7v2Df7Jt/K/DWYffrAY7ly8Cz6T1daOtWtjW9BykAHA+8tK/+NW35S4Hj+8qPb2VbA1f3la9Vbwjj2wb4NvBM4Iz2h+FWYP2x+xL4JrB3m16/1cvY/Ttabzb9HgAPa+GRMeXzbr/SC8qftABYv+3XP5pv+xVYzNpBOe37cqI2ZvLlqdf5Y/Qf6qgbW9mc0k5B7QZcCDyyqm4GaD8f0apNNNau8hvHKR+WDwFvBu5r81sAv6yqe9p8f/9+O6a2fHWrv67vwTBsD/w3cEI7zfxvSRYyD/drVf0X8C/Aj4Gb6e2nFczP/dpvJvblRG3MGINy/hjv2syc+uxPkkXAacDrq+pXXVXHKasHUD7jkvwJcEtVregvHqdqTbJs1o+V3pHS7sDHqmo34A56p84mMmfH2q6bvZDe6dJHAwuB541TdT7s10HMq/EZlPPHjcBj++a3AW4aUl/WWZIN6IXkSVV1eiv+aZKt2/KtgVta+URj7SrfZpzyYXg68KdJrgf+L73Trx8CHp5k9NnL/f377Zja8k2Bn7Pu78Ew3AjcWFUXtvlT6QXnfNyvBwI/qqr/rqo1wOnA05if+7XfTOzLidqYMQbl/HERsEO7y25DejcIfGXIfRpIu7vtk8BVVfXBvkVfAUbvins5vWuXo+WHtjvr9gJWt1My3wSek2Sz9j/859C7rnMzcFuSvVpbh/Zta0ZV1VurapuqWkxvH32nqg4BzgIOatXGjnX0PTio1a9W/pJ29+R2wA70boaYNb8HVfX/gJ8k2bEVPQu4knm4X+mdct0rycatL6NjnXf7dYyZ2JcTtTFzZvqiqK/pe9G70+xaenfHvW3Y/VmHfu9D7zTL5cCl7fV8etdsvg1c135u3uoH+Egb50pgpG9brwS+316H9ZWPAKvaOscx5gaTIY17f+6/63V7en8Qvw98AXhoK1/Q5r/flm/ft/7b2niuoe9uz9n0ewDsCixv+/ZL9O50nJf7FTgauLr157P07lydN/sVOJne9dc19I4AXzUT+3KiNmby5SPsJEnq4KlXSZI6GJSSJHUwKCVJ6mBQSpLUwaCUJKmDQSlJUgeDUpKkDv8f5E4WM3lUSeYAAAAASUVORK5CYII=\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "sns.barplot(x=subtype_ct.values, y=subtype_ct.index);"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [],
   "source": [
    "def id_to_filepath(img_id, img_dir=TRAIN_DIR):\n",
    "    filepath = f'{img_dir}/{img_id}.dcm' # pydicom doesn't play nice with Path objects\n",
    "    if os.path.exists(filepath):\n",
    "        return filepath\n",
    "    else:\n",
    "        return 'DNE'"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 32,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th colspan=\"6\" halign=\"left\">Label</th>\n",
       "      <th>filepath</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subtype</th>\n",
       "      <th></th>\n",
       "      <th>any</th>\n",
       "      <th>epidural</th>\n",
       "      <th>intraparenchymal</th>\n",
       "      <th>intraventricular</th>\n",
       "      <th>subarachnoid</th>\n",
       "      <th>subdural</th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>ID_000012eaf</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>ID_000039fa0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>ID_00005679d</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>ID_00008ce3c</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ID_0000950d7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   ID Label                                             \\\n",
       "Subtype                 any epidural intraparenchymal intraventricular   \n",
       "0        ID_000012eaf     0        0                0                0   \n",
       "1        ID_000039fa0     0        0                0                0   \n",
       "2        ID_00005679d     0        0                0                0   \n",
       "3        ID_00008ce3c     0        0                0                0   \n",
       "4        ID_0000950d7     0        0                0                0   \n",
       "\n",
       "                               \\\n",
       "Subtype subarachnoid subdural   \n",
       "0                  0        0   \n",
       "1                  0        0   \n",
       "2                  0        0   \n",
       "3                  0        0   \n",
       "4                  0        0   \n",
       "\n",
       "                                                  filepath  \n",
       "Subtype                                                     \n",
       "0        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  \n",
       "1        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  \n",
       "2        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  \n",
       "3        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  \n",
       "4        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  "
      ]
     },
     "execution_count": 32,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_new['filepath'] = train_new['ID'].apply(id_to_filepath)\n",
    "train_new.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 74,
   "metadata": {},
   "outputs": [],
   "source": [
    "def get_patient_data(filepath):\n",
    "    if filepath != 'DNE':\n",
    "        dcm_data = pydicom.dcmread(filepath, stop_before_pixels=True)\n",
    "        return dcm_data.PatientID, dcm_data.StudyInstanceUID, dcm_data.SeriesInstanceUID"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 75,
   "metadata": {},
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "100%|ā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆā–ˆ| 752803/752803 [17:31<00:00, 716.03it/s] \n"
     ]
    }
   ],
   "source": [
    "tqdm.pandas()\n",
    "train_new['PatientID'], train_new['StudyID'], train_new['SeriesID'] = zip(*train_new['filepath'].progress_apply(get_patient_data))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 76,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th colspan=\"6\" halign=\"left\">Label</th>\n",
       "      <th>filepath</th>\n",
       "      <th>PatientID</th>\n",
       "      <th>StudyID</th>\n",
       "      <th>SeriesID</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subtype</th>\n",
       "      <th></th>\n",
       "      <th>any</th>\n",
       "      <th>epidural</th>\n",
       "      <th>intraparenchymal</th>\n",
       "      <th>intraventricular</th>\n",
       "      <th>subarachnoid</th>\n",
       "      <th>subdural</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>ID_000012eaf</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_f15c0eee</td>\n",
       "      <td>ID_30ea2b02d4</td>\n",
       "      <td>ID_0ab5820b2a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>ID_000039fa0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_eeaf99e7</td>\n",
       "      <td>ID_134d398b61</td>\n",
       "      <td>ID_5f8484c3e0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>ID_00005679d</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_18f2d431</td>\n",
       "      <td>ID_b5c26cda09</td>\n",
       "      <td>ID_203cd6ec46</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>ID_00008ce3c</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_ce8a3cd2</td>\n",
       "      <td>ID_974735bf79</td>\n",
       "      <td>ID_3780d48b28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ID_0000950d7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>ID_0000aee4b</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_ce5f0b6c</td>\n",
       "      <td>ID_9aad90e421</td>\n",
       "      <td>ID_1e59488a44</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>ID_0000ca2f6</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_8c5a14af</td>\n",
       "      <td>ID_a84b7a0dcd</td>\n",
       "      <td>ID_d6ba679446</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>ID_0000f1657</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_df70c823</td>\n",
       "      <td>ID_04ef429610</td>\n",
       "      <td>ID_245e16180c</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>ID_000178e76</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_462abff7</td>\n",
       "      <td>ID_4fef99f0df</td>\n",
       "      <td>ID_72952d87fa</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>ID_00019828f</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_fc08e4cf</td>\n",
       "      <td>ID_ade653597d</td>\n",
       "      <td>ID_c0d8754a07</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   ID Label                                             \\\n",
       "Subtype                 any epidural intraparenchymal intraventricular   \n",
       "0        ID_000012eaf     0        0                0                0   \n",
       "1        ID_000039fa0     0        0                0                0   \n",
       "2        ID_00005679d     0        0                0                0   \n",
       "3        ID_00008ce3c     0        0                0                0   \n",
       "4        ID_0000950d7     0        0                0                0   \n",
       "5        ID_0000aee4b     0        0                0                0   \n",
       "6        ID_0000ca2f6     0        0                0                0   \n",
       "7        ID_0000f1657     0        0                0                0   \n",
       "8        ID_000178e76     0        0                0                0   \n",
       "9        ID_00019828f     0        0                0                0   \n",
       "\n",
       "                               \\\n",
       "Subtype subarachnoid subdural   \n",
       "0                  0        0   \n",
       "1                  0        0   \n",
       "2                  0        0   \n",
       "3                  0        0   \n",
       "4                  0        0   \n",
       "5                  0        0   \n",
       "6                  0        0   \n",
       "7                  0        0   \n",
       "8                  0        0   \n",
       "9                  0        0   \n",
       "\n",
       "                                                  filepath    PatientID  \\\n",
       "Subtype                                                                   \n",
       "0        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_f15c0eee   \n",
       "1        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_eeaf99e7   \n",
       "2        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_18f2d431   \n",
       "3        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_ce8a3cd2   \n",
       "4        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "5        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_ce5f0b6c   \n",
       "6        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_8c5a14af   \n",
       "7        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_df70c823   \n",
       "8        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_462abff7   \n",
       "9        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_fc08e4cf   \n",
       "\n",
       "               StudyID       SeriesID  \n",
       "Subtype                                \n",
       "0        ID_30ea2b02d4  ID_0ab5820b2a  \n",
       "1        ID_134d398b61  ID_5f8484c3e0  \n",
       "2        ID_b5c26cda09  ID_203cd6ec46  \n",
       "3        ID_974735bf79  ID_3780d48b28  \n",
       "4        ID_8881b1c4b1  ID_84296c3845  \n",
       "5        ID_9aad90e421  ID_1e59488a44  \n",
       "6        ID_a84b7a0dcd  ID_d6ba679446  \n",
       "7        ID_04ef429610  ID_245e16180c  \n",
       "8        ID_4fef99f0df  ID_72952d87fa  \n",
       "9        ID_ade653597d  ID_c0d8754a07  "
      ]
     },
     "execution_count": 76,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_new.head(10)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 77,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "752803\n",
      "18938\n",
      "21744\n",
      "21744\n"
     ]
    }
   ],
   "source": [
    "print(train_new.shape[0])\n",
    "print(len(train_new['PatientID'].unique()))\n",
    "print(len(train_new['StudyID'].unique()))\n",
    "print(len(train_new['SeriesID'].unique()))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 78,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_new.to_csv('train_new')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 80,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th colspan=\"6\" halign=\"left\">Label</th>\n",
       "      <th>filepath</th>\n",
       "      <th>PatientID</th>\n",
       "      <th>StudyID</th>\n",
       "      <th>SeriesID</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subtype</th>\n",
       "      <th></th>\n",
       "      <th>any</th>\n",
       "      <th>epidural</th>\n",
       "      <th>intraparenchymal</th>\n",
       "      <th>intraventricular</th>\n",
       "      <th>subarachnoid</th>\n",
       "      <th>subdural</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>ID_000012eaf</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_f15c0eee</td>\n",
       "      <td>ID_30ea2b02d4</td>\n",
       "      <td>ID_0ab5820b2a</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>ID_000039fa0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_eeaf99e7</td>\n",
       "      <td>ID_134d398b61</td>\n",
       "      <td>ID_5f8484c3e0</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>ID_00005679d</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_18f2d431</td>\n",
       "      <td>ID_b5c26cda09</td>\n",
       "      <td>ID_203cd6ec46</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>ID_00008ce3c</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_ce8a3cd2</td>\n",
       "      <td>ID_974735bf79</td>\n",
       "      <td>ID_3780d48b28</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ID_0000950d7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   ID Label                                             \\\n",
       "Subtype                 any epidural intraparenchymal intraventricular   \n",
       "0        ID_000012eaf     0        0                0                0   \n",
       "1        ID_000039fa0     0        0                0                0   \n",
       "2        ID_00005679d     0        0                0                0   \n",
       "3        ID_00008ce3c     0        0                0                0   \n",
       "4        ID_0000950d7     0        0                0                0   \n",
       "\n",
       "                               \\\n",
       "Subtype subarachnoid subdural   \n",
       "0                  0        0   \n",
       "1                  0        0   \n",
       "2                  0        0   \n",
       "3                  0        0   \n",
       "4                  0        0   \n",
       "\n",
       "                                                  filepath    PatientID  \\\n",
       "Subtype                                                                   \n",
       "0        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_f15c0eee   \n",
       "1        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_eeaf99e7   \n",
       "2        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_18f2d431   \n",
       "3        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_ce8a3cd2   \n",
       "4        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "\n",
       "               StudyID       SeriesID  \n",
       "Subtype                                \n",
       "0        ID_30ea2b02d4  ID_0ab5820b2a  \n",
       "1        ID_134d398b61  ID_5f8484c3e0  \n",
       "2        ID_b5c26cda09  ID_203cd6ec46  \n",
       "3        ID_974735bf79  ID_3780d48b28  \n",
       "4        ID_8881b1c4b1  ID_84296c3845  "
      ]
     },
     "execution_count": 80,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_new.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 105,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead tr th {\n",
       "        text-align: left;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr>\n",
       "      <th></th>\n",
       "      <th>ID</th>\n",
       "      <th colspan=\"6\" halign=\"left\">Label</th>\n",
       "      <th>filepath</th>\n",
       "      <th>PatientID</th>\n",
       "      <th>StudyID</th>\n",
       "      <th>SeriesID</th>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Subtype</th>\n",
       "      <th></th>\n",
       "      <th>any</th>\n",
       "      <th>epidural</th>\n",
       "      <th>intraparenchymal</th>\n",
       "      <th>intraventricular</th>\n",
       "      <th>subarachnoid</th>\n",
       "      <th>subdural</th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>ID_0000950d7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>39014</th>\n",
       "      <td>ID_0d428e6ca</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>41787</th>\n",
       "      <td>ID_0e320ef83</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>55957</th>\n",
       "      <td>ID_12fa73df7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>62110</th>\n",
       "      <td>ID_15089384e</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>78497</th>\n",
       "      <td>ID_1aa35c21e</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95072</th>\n",
       "      <td>ID_204e3c67f</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>95086</th>\n",
       "      <td>ID_204f882af</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>116802</th>\n",
       "      <td>ID_27ba4a0ed</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>168054</th>\n",
       "      <td>ID_39129ab61</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>210093</th>\n",
       "      <td>ID_4763efbcd</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>245270</th>\n",
       "      <td>ID_533fbdf73</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>277098</th>\n",
       "      <td>ID_5df494f62</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>347634</th>\n",
       "      <td>ID_75fe135a4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>369043</th>\n",
       "      <td>ID_7d436ad0f</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>377907</th>\n",
       "      <td>ID_803e590cf</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>394664</th>\n",
       "      <td>ID_85f4ec603</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>423116</th>\n",
       "      <td>ID_8f9695aef</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>432554</th>\n",
       "      <td>ID_92cafa5f6</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>436494</th>\n",
       "      <td>ID_941d491aa</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>467763</th>\n",
       "      <td>ID_9ea9667e9</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>480805</th>\n",
       "      <td>ID_a31ccb407</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>501610</th>\n",
       "      <td>ID_aa4ce8ca8</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>512286</th>\n",
       "      <td>ID_ade7354a7</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>514805</th>\n",
       "      <td>ID_aec1177e4</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>521980</th>\n",
       "      <td>ID_b1281c12f</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>550282</th>\n",
       "      <td>ID_bad859f87</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>632620</th>\n",
       "      <td>ID_d6fef64ad</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>639962</th>\n",
       "      <td>ID_d97ba5b0d</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>642046</th>\n",
       "      <td>ID_da36abbca</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>659382</th>\n",
       "      <td>ID_e024df4d1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>671395</th>\n",
       "      <td>ID_e43f11a72</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>717566</th>\n",
       "      <td>ID_f40ac6f95</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>731883</th>\n",
       "      <td>ID_f8e0c635e</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>740708</th>\n",
       "      <td>ID_fbe090828</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>747831</th>\n",
       "      <td>ID_fe506a641</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>/home/ubuntu/kaggle/rsna-intracranial-hemorrha...</td>\n",
       "      <td>ID_d278c67b</td>\n",
       "      <td>ID_8881b1c4b1</td>\n",
       "      <td>ID_84296c3845</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                   ID Label                                             \\\n",
       "Subtype                 any epidural intraparenchymal intraventricular   \n",
       "4        ID_0000950d7     0        0                0                0   \n",
       "39014    ID_0d428e6ca     0        0                0                0   \n",
       "41787    ID_0e320ef83     0        0                0                0   \n",
       "55957    ID_12fa73df7     0        0                0                0   \n",
       "62110    ID_15089384e     0        0                0                0   \n",
       "78497    ID_1aa35c21e     0        0                0                0   \n",
       "95072    ID_204e3c67f     0        0                0                0   \n",
       "95086    ID_204f882af     0        0                0                0   \n",
       "116802   ID_27ba4a0ed     1        0                1                0   \n",
       "168054   ID_39129ab61     0        0                0                0   \n",
       "210093   ID_4763efbcd     0        0                0                0   \n",
       "245270   ID_533fbdf73     0        0                0                0   \n",
       "277098   ID_5df494f62     0        0                0                0   \n",
       "347634   ID_75fe135a4     0        0                0                0   \n",
       "369043   ID_7d436ad0f     0        0                0                0   \n",
       "377907   ID_803e590cf     0        0                0                0   \n",
       "394664   ID_85f4ec603     0        0                0                0   \n",
       "423116   ID_8f9695aef     0        0                0                0   \n",
       "432554   ID_92cafa5f6     0        0                0                0   \n",
       "436494   ID_941d491aa     0        0                0                0   \n",
       "467763   ID_9ea9667e9     0        0                0                0   \n",
       "480805   ID_a31ccb407     0        0                0                0   \n",
       "501610   ID_aa4ce8ca8     0        0                0                0   \n",
       "512286   ID_ade7354a7     0        0                0                0   \n",
       "514805   ID_aec1177e4     0        0                0                0   \n",
       "521980   ID_b1281c12f     0        0                0                0   \n",
       "550282   ID_bad859f87     0        0                0                0   \n",
       "632620   ID_d6fef64ad     1        0                1                0   \n",
       "639962   ID_d97ba5b0d     0        0                0                0   \n",
       "642046   ID_da36abbca     0        0                0                0   \n",
       "659382   ID_e024df4d1     0        0                0                0   \n",
       "671395   ID_e43f11a72     0        0                0                0   \n",
       "717566   ID_f40ac6f95     1        0                1                0   \n",
       "731883   ID_f8e0c635e     0        0                0                0   \n",
       "740708   ID_fbe090828     0        0                0                0   \n",
       "747831   ID_fe506a641     0        0                0                0   \n",
       "\n",
       "                               \\\n",
       "Subtype subarachnoid subdural   \n",
       "4                  0        0   \n",
       "39014              0        0   \n",
       "41787              0        0   \n",
       "55957              0        0   \n",
       "62110              0        0   \n",
       "78497              0        0   \n",
       "95072              0        0   \n",
       "95086              0        0   \n",
       "116802             0        0   \n",
       "168054             0        0   \n",
       "210093             0        0   \n",
       "245270             0        0   \n",
       "277098             0        0   \n",
       "347634             0        0   \n",
       "369043             0        0   \n",
       "377907             0        0   \n",
       "394664             0        0   \n",
       "423116             0        0   \n",
       "432554             0        0   \n",
       "436494             0        0   \n",
       "467763             0        0   \n",
       "480805             0        0   \n",
       "501610             0        0   \n",
       "512286             0        0   \n",
       "514805             0        0   \n",
       "521980             0        0   \n",
       "550282             0        0   \n",
       "632620             0        0   \n",
       "639962             0        0   \n",
       "642046             0        0   \n",
       "659382             0        0   \n",
       "671395             0        0   \n",
       "717566             0        0   \n",
       "731883             0        0   \n",
       "740708             0        0   \n",
       "747831             0        0   \n",
       "\n",
       "                                                  filepath    PatientID  \\\n",
       "Subtype                                                                   \n",
       "4        /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "39014    /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "41787    /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "55957    /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "62110    /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "78497    /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "95072    /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "95086    /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "116802   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "168054   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "210093   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "245270   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "277098   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "347634   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "369043   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "377907   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "394664   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "423116   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "432554   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "436494   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "467763   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "480805   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "501610   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "512286   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "514805   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "521980   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "550282   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "632620   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "639962   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "642046   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "659382   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "671395   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "717566   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "731883   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "740708   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "747831   /home/ubuntu/kaggle/rsna-intracranial-hemorrha...  ID_d278c67b   \n",
       "\n",
       "               StudyID       SeriesID  \n",
       "Subtype                                \n",
       "4        ID_8881b1c4b1  ID_84296c3845  \n",
       "39014    ID_8881b1c4b1  ID_84296c3845  \n",
       "41787    ID_8881b1c4b1  ID_84296c3845  \n",
       "55957    ID_8881b1c4b1  ID_84296c3845  \n",
       "62110    ID_8881b1c4b1  ID_84296c3845  \n",
       "78497    ID_8881b1c4b1  ID_84296c3845  \n",
       "95072    ID_8881b1c4b1  ID_84296c3845  \n",
       "95086    ID_8881b1c4b1  ID_84296c3845  \n",
       "116802   ID_8881b1c4b1  ID_84296c3845  \n",
       "168054   ID_8881b1c4b1  ID_84296c3845  \n",
       "210093   ID_8881b1c4b1  ID_84296c3845  \n",
       "245270   ID_8881b1c4b1  ID_84296c3845  \n",
       "277098   ID_8881b1c4b1  ID_84296c3845  \n",
       "347634   ID_8881b1c4b1  ID_84296c3845  \n",
       "369043   ID_8881b1c4b1  ID_84296c3845  \n",
       "377907   ID_8881b1c4b1  ID_84296c3845  \n",
       "394664   ID_8881b1c4b1  ID_84296c3845  \n",
       "423116   ID_8881b1c4b1  ID_84296c3845  \n",
       "432554   ID_8881b1c4b1  ID_84296c3845  \n",
       "436494   ID_8881b1c4b1  ID_84296c3845  \n",
       "467763   ID_8881b1c4b1  ID_84296c3845  \n",
       "480805   ID_8881b1c4b1  ID_84296c3845  \n",
       "501610   ID_8881b1c4b1  ID_84296c3845  \n",
       "512286   ID_8881b1c4b1  ID_84296c3845  \n",
       "514805   ID_8881b1c4b1  ID_84296c3845  \n",
       "521980   ID_8881b1c4b1  ID_84296c3845  \n",
       "550282   ID_8881b1c4b1  ID_84296c3845  \n",
       "632620   ID_8881b1c4b1  ID_84296c3845  \n",
       "639962   ID_8881b1c4b1  ID_84296c3845  \n",
       "642046   ID_8881b1c4b1  ID_84296c3845  \n",
       "659382   ID_8881b1c4b1  ID_84296c3845  \n",
       "671395   ID_8881b1c4b1  ID_84296c3845  \n",
       "717566   ID_8881b1c4b1  ID_84296c3845  \n",
       "731883   ID_8881b1c4b1  ID_84296c3845  \n",
       "740708   ID_8881b1c4b1  ID_84296c3845  \n",
       "747831   ID_8881b1c4b1  ID_84296c3845  "
      ]
     },
     "execution_count": 105,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_new[train_new.PatientID == 'ID_d278c67b']"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 95,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True     733865\n",
       "False     18938\n",
       "Name: PatientID, dtype: int64"
      ]
     },
     "execution_count": 95,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_new.PatientID.duplicated().value_counts()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}